Dependence Structure Estimation via Copula
Abstract

We propose a new framework for dependence structure learning via
copulas. Copula theory is a statistical theory of dependence and of
measures of association. Graphical models can be regarded as a special
case of a copula family, named the product copula. In this paper, a
nonparametric algorithm for copula estimation is presented. Then a
Chow-Liu-like method based on copula-derived dependence measures is
proposed to estimate a maximum spanning product copula with only
bivariate dependence relations. One advantage of the framework is that
learning with the empirical copula focuses only on the dependence
relations among random variables, without requiring knowledge of the
properties of the individual variables. Another advantage is that the
copula is a universal model of dependence, so the framework based on it
can be generalized to deal with a wide range of complex dependence
relations. Experiments on both simulated data and real application data
show the effectiveness of the proposed method.



1 Introduction 

Dependence between random variables is of fundamental importance
because it may imply essential statistical relations within real-world
social, physical, or biological systems. A large number of data sets are
collected in different fields, such as biology, social networks,
finance, and the World Wide Web, and analyzing them remains a challenge.
Hence, dependence structure learning is one of the most actively studied
problems in the machine learning community.

The most well-established statistical methodology for dependence
representation is graphical models, or Bayesian networks (Heckerman et
al., 1995; Buntine, 1996; Jordan, 1998). In the graphical model
formalism, a probability density is represented by a directed or
undirected graph in which each node represents a random variable and
each edge represents a conditional dependence relation between two
random variables. Representational simplicity, achieved through
bivariate dependence decomposition, reduces computational complexity and
makes modeling and inference for large-scale problems tractable. The
implicit assumption of graphical models is the Markov property, or
conditional independence, which means that only first-order (pairwise)
dependence is considered in the model. This approach may be
inappropriate in many cases.

On the other hand, traditional methods for inferring graphical models
usually involve maximum likelihood, for which one must specify a
parametric family for the entire underlying distribution, implicitly
including the margins of the individual variables. The choice of
hypotheses for the margins affects the performance of structure learning
to a large extent, but the prior knowledge needed for such a choice is
often lacking. We are therefore interested in finding a method that can
separate structure learning from the parametric specification of the
margins.

Copula theory unifies the representation of multivariate dependence
(Joe, 1997; Nelsen, 1998). The term "copula", which comes from Latin,
refers to the way random variables relate to each other. According to
Sklar's theorem (Sklar, 1959), a multivariate distribution can be
represented through its margins and a copula function that represents
the dependence structure among the random variables. Using a copula, one
can separate the margins from the joint distribution and therefore study
only the statistical interrelations, without knowing the individual
properties of each variable. Copulas have wide applications in finance
(E. Bouye & Roncalli, 2000) and have recently gained the attention of
the machine learning community (Ma & Sun, 2007; Kirshner, 2007).

The main contribution of this paper is a novel framework for structure
learning based on copulas. We study the estimation of dependence
structure via copulas. In particular, we propose that the dependence
structure is first approximated by the empirical copula (or copula
density), and that a particular dependence model is then fitted to it.
The main advantage of the empirical copula is that it is a rank-based,
model-free, nonparametric estimate of the underlying 'true' copula.
Based on the empirical copula estimate, many dependence structures can
be adopted and approximately inferred. For instance, in this paper
graphical models are identified as a special case of copulas: graphical
models concern only pairwise dependence and have a counterpart in copula
theory, called the product copula. We propose inferring the product
copula with a Chow-Liu-like algorithm (Chow & Liu, 1968) based on
empirical copula estimation. Moreover, our method can be generalized to
estimate much more flexible relationships.

2 Copula and Copula Space 

2.1 Definition and Properties 

Copulas are the functions that model the dependence 
relations among random variables, and can be defined 
as follows: 

Definition 2.1 (Copula). (Joe, 1997; Nelsen, 1998)
Given $N$ random variables $X = \{X_1, \ldots, X_N\} \in \mathbb{R}^N$,
let $\{u_i = F_i(x_i), i = 1, \ldots, N\}$ be the marginal distributions
of $X$. An $N$-dimensional copula $C : I^N \to I$ ($I = [0, 1]$) of $X$
is a function with the following properties:

• $C$ is grounded and $N$-increasing;

• $C(1, \ldots, 1, u_i, 1, \ldots, 1) = u_i$.

Intuitively, a copula can be viewed as a new cumulative distribution
function (CDF) obtained by stretching the CDF of $X$ onto
$\mathbf{u} \in I^N$.

The relation between CDF, margins, and copula is 
stated in Sklar's theorem (Sklar, 1959): 

Theorem 2.2 (Sklar's Theorem). Given a random vector
$X = \{X_1, \ldots, X_N\}$, its CDF $F(\mathbf{x})$ can be represented as

$$F(\mathbf{x}) = C(u_1, \ldots, u_N), \qquad (1)$$

where $C$ is a copula function and $\{u_i\}$ are the marginal
distribution functions of $\mathbf{x}$. If $\{F_i\}$ are continuous,
then $C$ is unique.

Sklar's theorem is of fundamental importance in copula theory. By
differentiating equation (1), we can also represent the probability
density function (PDF) via the copula. Let us first introduce a new
definition, the copula density.

Remark 2.3. According to Sklar's theorem, the dependence structure is
independent of the margins. This implies that the same structure may be
learned from different distributions. Such distributions are said to be
equivalent in the sense of copula.

Definition 2.4 (Copula Density). The $N$-dimensional copula density $c$
corresponding to an $N$-copula $C$ is defined as

$$c(\mathbf{u}) = \frac{\partial^N}{\partial u_1 \cdots \partial u_N} C(\mathbf{u}), \qquad (2)$$

where $\mathbf{u} \in I^N$.

With the definition of copula density, we can derive a 
corollary of Sklar's theorem: 

Corollary 2.5. The probability density function (PDF) $p(\mathbf{x})$
of $X$ can be represented as

$$p(\mathbf{x}) = c(\mathbf{u}) \prod_{i=1}^{N} p_i(x_i), \qquad (3)$$

where $\{p_i, i = 1, \ldots, N\}$ are the marginal density functions of
$X$, and $c$ is the copula density.
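As a numerical illustration of equation (3), the following minimal sketch checks the factorization for a bivariate Gaussian distribution with standard normal margins. The choice of the Gaussian copula, the closed-form density used for it, and all function names are assumptions made for this example; they are not taken from the paper.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal margin, assumed for both variables


def gaussian_copula_density(u, v, rho):
    """Density of the bivariate Gaussian copula with correlation rho."""
    a = STD_NORMAL.inv_cdf(u)
    b = STD_NORMAL.inv_cdf(v)
    quad = (rho**2 * (a**2 + b**2) - 2.0 * rho * a * b) / (2.0 * (1.0 - rho**2))
    return math.exp(-quad) / math.sqrt(1.0 - rho**2)


def bivariate_normal_pdf(x, y, rho):
    """Joint PDF of two standard normal variables with correlation rho."""
    quad = (x**2 - 2.0 * rho * x * y + y**2) / (2.0 * (1.0 - rho**2))
    return math.exp(-quad) / (2.0 * math.pi * math.sqrt(1.0 - rho**2))


def pdf_via_copula(x, y, rho):
    """Right-hand side of equation (3): c(u) * p_1(x) * p_2(y)."""
    u, v = STD_NORMAL.cdf(x), STD_NORMAL.cdf(y)
    return gaussian_copula_density(u, v, rho) * STD_NORMAL.pdf(x) * STD_NORMAL.pdf(y)


x, y, rho = 0.3, -1.1, 0.6  # arbitrary evaluation point and correlation
direct = bivariate_normal_pdf(x, y, rho)
factored = pdf_via_copula(x, y, rho)
print(direct, factored)  # the two values should agree up to rounding
```

Replacing one margin by a different continuous distribution would change the joint density but leave the copula density untouched, which is the separation the corollary expresses.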

2.2 Copula Space 

As representations of dependence structure, copula functions form a
convex set bounded by the minimal copula and the maximal copula (Nelsen,
1998).

How to construct a multivariate copula is important in applications.
Although Sklar's theorem guarantees the existence of a copula function,
the copula cannot always be identified with a parametric one, and in
many cases we cannot write down an analytic copula. The following
results provide ways of constructing flexible copula representations in
the multivariate case.

2.2.1 Mixture of Copulas 

Theorem 2.6 (Mixture of Copulas). A weighted arithmetic mean (convex
combination) of copulas (or copula densities) is also a copula (or
copula density).

The theorem can be illustrated as follows:

$$c(\mathbf{u}) = \sum_{k=1}^{K} w_k c_k(\mathbf{u}), \qquad (4)$$

where $c_k$ represents any type of copula density, and
$\sum_{k=1}^{K} w_k = 1$, $w_k > 0$.

Remark 2.7. Based on the above result, we can construct more flexible
copula models as mixtures of copulas. The copulas being mixed can come
from parametric families, or can be the product copulas presented below
(Kirshner, 2007).



2.2.2 Product Copula

Theorem 2.8 (Product copula). The product of copula densities of
independent variables is also a copula density.

The theorem can be illustrated as

$$c(\mathbf{u}) = \prod_{m=1}^{M} c_m(\mathbf{u}_m), \qquad (5)$$

where $\{c_m\}$ are any type of copula density,
$\mathbf{u} = \bigcup_{m=1}^{M} \mathbf{u}_m$, and $\{\mathbf{u}_m\}$
are vectors of marginal functions of the random variables. If all the
sub-copulas $c_m$ are bivariate, only pairwise dependence exists. In
this case, the product copula is equivalent to a graphical model.

Theorem 2.9. Any graphical model is equivalent to a product copula with
only bivariate sub-copulas.

The theorem indicates that a graphical model is just a special case of a
product copula. More generally, a hypergraph can also be formulated as a
special case of a product copula, with each sub-copula of variable
dimension corresponding to a sub-graph.

3 Estimation Methods via Empirical Copula

A large number of inference methods for copulas can be summarized as
follows: start with a parametric family of copulas, either implicitly
implied by the PDF or explicitly specified, and then optimize the
parameters under the maximum likelihood framework. Using a nonparametric
method helps us avoid the risk of choosing an unsuitable parametric
family when no prior knowledge is available. In this section, we
introduce an estimation algorithm for the empirical copula (density). To
the best of our knowledge, no such work has been reported before.

3.1 Empirical Copula

The empirical copula was introduced by Deheuvels (Deheuvels, 1979;
Deheuvels, 1981). It approximates the copula or copula density of
samples based on order statistics.

Consider an i.i.d. sample set
$X^t = \{x_1^t, \ldots, x_N^t\} \in \mathbb{R}^N$, $t = 1, \ldots, T$.
Let $\{x_n^{(t)}\}$ be the order statistics and $1 \le r_n^t \le T$ the
corresponding ranks, so that $x_n^{(r_n^t)} = x_n^t$.

Definition 3.1 (Empirical Copula). An empirical copula $C$ of samples
$\{X^t, t = 1, \ldots, T\}$ is defined on a $(T+1)$ lattice

$$\mathcal{L}_N := \left\{ \left( \frac{t_1}{T}, \ldots, \frac{t_N}{T} \right) \,\middle|\, t_n = 0, \ldots, T,\ n = 1, \ldots, N \right\} \qquad (6)$$

as follows:

$$C\left( \frac{t_1}{T}, \ldots, \frac{t_N}{T} \right) = \frac{1}{T} \sum_{t=1}^{T} I\left[ r_n^t \le t_n,\ n = 1, \ldots, N \right], \qquad (7)$$

where $I$ denotes the indicator function.

Using forward differences on the lattice, the empirical copula density
can be derived in the same way:

$$c\left( \frac{t_1}{T}, \ldots, \frac{t_N}{T} \right) = \sum_{i_1=1}^{2} \cdots \sum_{i_N=1}^{2} (-1)^{\sum_{n=1}^{N} (i_n - 1)} \, C\left( \frac{t_1 - i_1 + 1}{T}, \ldots, \frac{t_N - i_N + 1}{T} \right). \qquad (8)$$

3.2 Estimation Algorithm



Based on definition (7), it is straightforward to give an estimation
algorithm for the empirical copula from a group of samples (see
Algorithm 1). According to equation (8), the algorithm for the empirical
copula density function (Algorithm 2) is just an accumulation process
based on Algorithm 1.

Algorithm 1 has linear time complexity $O(TN)$, while Algorithm 2 has
exponential time complexity $O(TN \cdot 2^N)$. Notice that both
algorithms are based on samples. We can compute the empirical values of
both functions in advance and store them in a lookup table, which
reduces the computation time in applications.

Algorithm 1 Empirical Copula Function C

Input: data $x^t$, dimension $N$, size $T$, $\mathbf{u} \in I^N$
for $n = 1$ to $N$ do
    $r_n$ = Rank($x_n$)
end for
$m = 0$, $u_n = u_n \cdot T$
for $t = 1$ to $T$ do
    Initialize $n = 1$
    while $n \le N$ and $r_n^t \le u_n$ do
        $n = n + 1$
    end while
    if $n = N + 1$ then
        $m = m + 1$
    end if
end for
Output: $C(\mathbf{u}) = m/T$
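The empirical copula of equation (7) and its difference-based frequency can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, ties within a column are assumed absent, lattice points are passed as integer indices $t_n$ rather than as $t_n/T$, and the second function returns the probability mass of one lattice cell (multiplying it by $T^N$ would give a density value).

```python
import itertools
import numpy as np


def rank_transform(data):
    """Column-wise ranks in 1..T (assumes no ties within a column)."""
    data = np.asarray(data)
    return np.argsort(np.argsort(data, axis=0), axis=0) + 1


def empirical_copula(ranks, t):
    """C(t_1/T, ..., t_N/T): fraction of samples with r_n <= t_n for all n."""
    t = np.asarray(t)
    return float(np.mean(np.all(ranks <= t, axis=1)))


def empirical_copula_frequency(ranks, t):
    """Mass of the lattice cell ending at t, by differencing C over its 2^N corners."""
    t = np.asarray(t)
    total = 0.0
    for shift in itertools.product((0, 1), repeat=len(t)):
        sign = (-1) ** sum(shift)  # inclusion-exclusion sign of this corner
        total += sign * empirical_copula(ranks, t - np.array(shift))
    return total


# Four bivariate samples; their rank pairs are (1,1), (3,2), (2,4), (4,3).
samples = np.array([[0.1, 10.0], [0.4, 20.0], [0.2, 40.0], [0.9, 30.0]])
ranks = rank_transform(samples)
print(empirical_copula(ranks, [2, 2]))
print(empirical_copula_frequency(ranks, [3, 2]))
```

Each call to `empirical_copula` scans all $T$ samples over $N$ dimensions, and the frequency sums over $2^N$ corners, which matches the complexities stated for Algorithms 1 and 2.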



Algorithm 2 Empirical Copula Density Function c

Input: data $x^t$, dimension $N$, size $T$, $\mathbf{u} \in I^N$
$c(\mathbf{u}) = 0$
for all $\mathbf{i} \in \{1, 2\}^N$ do
    $c(\mathbf{u}) = c(\mathbf{u}) + (-1)^{\sum_n (i_n - 1)} \, C(\mathbf{u} - (\mathbf{i} - \mathbf{1})/T)$
end for
Output: $c(\mathbf{u})$

3.3 As a universal basis

Using the empirical copula to estimate dependence structure has several
advantages. First, with the nonparametric empirical copula algorithm, we
can estimate different dependence relations from data in a model-free
way. Second, copulas are invariant under monotonically increasing
transformations, so we do not have to normalize the data during
analysis. Third, copulas are insensitive to outliers.

4 Maximum spanning copula estimation by Chow-Liu algorithm

In this section, we go a step further and include structure learning
with graphical models in our copula framework. As previously stated, a
graphical model is a special case of a copula: the dependence relations
represented by the edges of the graph are equivalent to a product of a
group of bivariate copulas. We propose inferring such a product copula
from data with the Chow-Liu algorithm, based on empirical copula
estimation. Notice that the copula contains all the dependence
information of the random variables; inferring only a product copula
from data is just one way of approximating the 'true' underlying copula.

4.1 Maximum spanning copula problem

We want to estimate the dependence structure without being concerned
with the properties of the individual variables. Dependence is measured
by the copula and, if possible, only by the copula, so that we do not
have to make additional assumptions about, or infer, the parametric form
of the individual variables at the risk of fitting unsuitable models.

Suppose we want to approximate the dependence relations with a type of
structure $\mathcal{T}(t)$, where $t$ is the parameter specifying
$\mathcal{T}$. Given a group of i.i.d. samples $X$ generated from an
$N$-dimensional random vector
$\mathbf{x} \in \mathbb{R}^N \sim p(\mathbf{x})$, an objective function
$F$ can be defined on it and optimized to infer $\mathcal{T}$ with
respect to $t$:

$$\min_t F(t; X). \qquad (9)$$

In many works, the objective function $F$ is defined through the maximum
likelihood principle, which requires parametric assumptions on the
multivariate density function $p(\mathbf{x})$.

Now consider the $N$-dimensional copula density $c$ of $\mathbf{x}$. We
can derive its empirical estimate $\hat{c}$ based on $X$, which contains
all the dependence information in the data. A natural idea is to cover
as much of the dependence relations as possible with a product copula.
We call such a product copula a "maximum spanning copula" (MSC).
Structure learning is thereby transformed into a fitting problem.

In this paper, we focus on the MSC given by a product copula. The MSC
approximation of $c$ in terms of a bivariate product copula (or
graphical model) is composed of a product of $N - 1$ bivariate copulas.
The above $F$ can then be defined as the sum of the dependence
measurements on the $N - 1$ sub-copulas. The problem is how to find the
optimal product copula to approximate the dependence structure among the
random variables.


4.2 Dependence measure 

First, we need to choose a copula-based dependence measure.

4.2.1 Statistical measures of independence 

The copula summarizes all the dependence relations. Hence it is natural
to link it with the dependence measures proposed in statistics. It has
been proved that measures such as Kendall's tau, Spearman's rho, and
Gini's gamma can be calculated from the copula function alone (Nelsen,
1998). Using the empirical copula (density) to approximate the copula
(density), we can calculate these measures approximately. For example,
an estimate of Spearman's rho based on the empirical bivariate copula is

$$\rho = \frac{12}{T^2 - 1} \sum_{t_1=1}^{T} \sum_{t_2=1}^{T} \left[ C\left( \frac{t_1}{T}, \frac{t_2}{T} \right) - \frac{t_1}{T} \cdot \frac{t_2}{T} \right], \qquad (10)$$

where $T$ is the order of the lattice.
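A lattice-based estimate of Spearman's rho of this kind can be sketched as follows. The code assumes the standard sample form given by Nelsen, $\frac{12}{T^2-1}\sum_{t_1}\sum_{t_2}[C(t_1/T, t_2/T) - (t_1/T)(t_2/T)]$, with the empirical bivariate copula tabulated by a cumulative sum; the function name is hypothetical and ties are assumed absent.

```python
import numpy as np


def spearman_rho_from_empirical_copula(x, y):
    """Spearman's rho from the empirical bivariate copula on the T x T lattice."""
    x, y = np.asarray(x), np.asarray(y)
    T = len(x)
    rx = np.argsort(np.argsort(x))  # 0-based ranks, assuming no ties
    ry = np.argsort(np.argsort(y))
    counts = np.zeros((T, T))
    counts[rx, ry] = 1.0  # one sample per (rank_x, rank_y) cell
    # C[i, j] holds C((i+1)/T, (j+1)/T): fraction of samples with both ranks in range
    C = counts.cumsum(axis=0).cumsum(axis=1) / T
    grid = np.arange(1, T + 1) / T
    independence = np.outer(grid, grid)  # (t1/T) * (t2/T), the independence copula
    return 12.0 / (T**2 - 1) * float(np.sum(C - independence))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(spearman_rho_from_empirical_copula(x, x))                          # comonotone case
print(spearman_rho_from_empirical_copula(x, [2.0, 1.0, 4.0, 3.0, 5.0]))  # mildly shuffled
```

Tabulating the whole lattice costs $O(T^2)$ memory, so for large samples a coarser lattice or the equivalent rank-correlation formula may be preferable.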

4.2.2 Mutual Information 

Mutual information (MI) is a dependence measure from information theory
(Cover & Thomas, 1991). Because the copula density is actually a density
on $I^N$, MI can be used to measure the divergence between the 'true'
density and its estimate.

$$I(x, y) = \int_{x,y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)} \, dx \, dy. \qquad (11)$$

Equation (11) can be transformed into a copula density representation:

$$I(x, y) = \int_{x,y} p(x) p(y) \, c(u_x, u_y) \log c(u_x, u_y) \, dx \, dy, \qquad (12)$$

where $u$ denotes the marginal distributions. Given a group of data
$(x_1, x_2)$, we can calculate MI as

$$I(x, y) = \sum_{(x_1, x_2)} p(x_1) p(x_2) \, c(u_x, u_y) \log c(u_x, u_y). \qquad (13)$$

In this formulation, besides the empirically estimated copula density,
the univariate marginal densities are the functions to be estimated, for
which there are many well-established methods, such as the naive
estimator, k-NN, and kernel methods (Silverman, 1986). For this problem,
we adopt the Gaussian kernel estimator because both the density and its
derivative are easy to calculate.
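For illustration only, mutual information can also be estimated directly on the copula scale. Since $u = F(x)$ implies $du = p(x)\,dx$, the copula form of MI reduces to $\int c(u, v) \log c(u, v)\, du\, dv$ over the unit square, so a binned copula density is enough. The sketch below uses this simplification instead of the kernel-based marginal estimates described above; the function name, the bin count, and the pseudo-observation convention $(r - 0.5)/T$ are assumptions of the example.

```python
import numpy as np


def copula_mutual_information(x, y, bins=8):
    """Binned estimate of the integral of c(u, v) * log c(u, v) over the unit square."""
    x, y = np.asarray(x), np.asarray(y)
    T = len(x)
    # Rank-transform to pseudo-observations in (0, 1); assumes no ties.
    u = (np.argsort(np.argsort(x)) + 0.5) / T
    v = (np.argsort(np.argsort(y)) + 0.5) / T
    counts, _, _ = np.histogram2d(u, v, bins=bins, range=[[0.0, 1.0], [0.0, 1.0]])
    mass = counts / T          # probability mass of each cell
    density = mass * bins**2   # piecewise-constant copula density
    nonzero = mass > 0         # empty cells contribute nothing (0 * log 0 := 0)
    return float(np.sum(mass[nonzero] * np.log(density[nonzero])))


rng = np.random.default_rng(0)  # seed chosen arbitrarily
a = rng.normal(size=5000)
b = rng.normal(size=5000)
print(copula_mutual_information(a, b))          # close to zero for independent data
print(copula_mutual_information(a, np.exp(a)))  # large for a monotone relation
```

The estimate is bounded above by $\log(\text{bins})$ for perfectly monotone data, so the bin count acts as a resolution parameter that trades bias against variance.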

4.3 Construction algorithm of product 
copula 

We propose approximating the copula (density) of the samples in the form
of a product copula (density). First, based on the matrix of dependence
measures, a complete graph $\mathcal{G}$ on the $N$ random variables is
built, in which the weight of each edge equals the dependence between
the two variables it connects. Constructing the optimal product copula
is then equivalent to finding the maximum spanning tree of
$\mathcal{G}$. This is a well-defined problem, which can be solved by
the Chow-Liu algorithm (Chow & Liu, 1968). The Chow-Liu algorithm is in
fact an algorithm for constructing a maximum spanning tree with MI as
the edge weights. There are established algorithms for this problem,
such as Kruskal's algorithm (Kruskal, 1956) and Prim's algorithm (Prim,
1957), both of which find the solution in polynomial time. We adopt
Prim's algorithm in our method. It starts with an edge set $E$
containing only the maximum-weight edge, and then repeatedly adds one
vertex $v$ not yet covered by $E$, together with the maximum-weight edge
$(u, v)$ that connects $v$ to a vertex $u$ covered by $E$, so that
$(u, v)$ creates no loop in the new $E$. The process stops when $E$
covers all the vertices, which for a complete graph means that $E$
contains $N - 1$ edges.
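The tree construction just described can be sketched as a plain Prim-style loop over a symmetric weight matrix. This is a minimal illustration under stated assumptions: the function name and the example weights are hypothetical, the matrix is assumed symmetric with at least two variables, and the simple $O(N^3)$ scan is used for clarity rather than speed.

```python
import numpy as np


def maximum_spanning_tree_prim(weights):
    """Return the N-1 edges (u, v) of a maximum spanning tree of a complete graph."""
    weights = np.asarray(weights, dtype=float)
    n = weights.shape[0]
    masked = weights.copy()
    np.fill_diagonal(masked, -np.inf)  # exclude self-loops
    # Seed the tree with the heaviest edge.
    u0, v0 = np.unravel_index(np.argmax(masked), masked.shape)
    in_tree = {int(u0), int(v0)}
    edges = [(int(u0), int(v0))]
    while len(in_tree) < n:
        best = None
        # Heaviest edge from a covered vertex u to an uncovered vertex v.
        for u in in_tree:
            for v in range(n):
                if v not in in_tree and (best is None or masked[u, v] > masked[best]):
                    best = (u, v)
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Hypothetical dependence matrix for four variables.
W = np.array([[0.0, 0.9, 0.2, 0.1],
              [0.9, 0.0, 0.8, 0.3],
              [0.2, 0.8, 0.0, 0.7],
              [0.1, 0.3, 0.7, 0.0]])
print(maximum_spanning_tree_prim(W))
```

Because every added edge joins a covered vertex to an uncovered one, no loop can arise, and the loop ends after exactly $N - 1$ edges.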

4.4 Algorithm 

We give the whole algorithm in this section. It is composed of three
steps:

Algorithm 3 Estimating dependence structure via copula

Input: data $x^t$, dimension $N$, size $T$, $\mathbf{u} \in I^N$
Construct the empirical copula density $c$ by Algorithm 2;
Calculate the MI matrix $M_x$ of $x$ by Equation (13);
Build the dependence tree $\mathcal{T}$ by the Chow-Liu algorithm based on $M_x$.



4.5 Related to Density Estimation 

If we want to estimate not merely the dependence structure but the whole
underlying density, the nodes and edges of the graph should be
parameterized after the graph has been derived for the dependence
structure. Because the copula separates the margins from the joint
distribution, density estimation can be achieved here by estimating the
margins in addition to structure learning. What remains is to estimate
the one-dimensional margins of the individual variables. There are many
established methods for this task, which is, however, beyond the scope
of this paper.

5 Experiments and Results 

5.1 Simulated data 

We apply our method to a group of simulated multivariate data to
investigate its effectiveness. A dataset of 1000 samples is randomly
generated from the 5-dimensional distribution of a random vector, of
which the first three elements are zero-mean Gaussian and the other two
are governed by a Gaussian copula with a normal and an exponential
margin, respectively. Because only Gaussian variables and a Gaussian
copula are involved, we measure dependence with Spearman's rho using
Equation (10). Based on the estimated measurements, we expect to obtain
a graph in which the three Gaussian variables and the two variables
coupled by the copula are grouped together, respectively.

Algorithm 3 is run on the dataset, and an empirical copula representing
the dependence relations between the random variables is estimated, as
illustrated in Figure 1. Based on it, we derive an approximate
dependence tree, as illustrated in Figure 2.
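A rough re-creation of this kind of experiment is sketched below. The paper does not state the covariance of the three Gaussian variables or the copula parameter, so the common correlation of 0.7, the random seed, and all function names are assumptions. Spearman's rho is computed here as the Pearson correlation of the ranks, and the tree is built with Kruskal's algorithm, which the text names as an alternative to Prim's.

```python
from statistics import NormalDist
import numpy as np


def spearman_matrix(data):
    """Pairwise Spearman's rho, computed as the Pearson correlation of the ranks."""
    ranks = np.argsort(np.argsort(data, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


def maximum_spanning_tree_kruskal(weights):
    """Kruskal's algorithm with a small union-find; returns N-1 edges (i, j), i < j."""
    n = weights.shape[0]
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    candidates = sorted(((weights[i, j], i, j) for i in range(n) for j in range(i + 1, n)),
                        reverse=True)
    edges = []
    for _, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:  # accept the edge only if it joins two components
            parent[ri] = rj
            edges.append((i, j))
    return edges


rng = np.random.default_rng(0)  # seed chosen arbitrarily
T = 1000


def correlated_normals(dim, rho):
    """Zero-mean normals with unit variance and common correlation rho (assumed)."""
    cov = np.full((dim, dim), rho) + (1.0 - rho) * np.eye(dim)
    return rng.multivariate_normal(np.zeros(dim), cov, size=T)


gauss = correlated_normals(3, 0.7)    # three jointly Gaussian variables
latent = correlated_normals(2, 0.7)   # latent pair defining a Gaussian copula
cn = latent[:, 0]                     # normal margin
u = np.vectorize(NormalDist().cdf)(latent[:, 1])
ce = -np.log1p(-u)                    # exponential margin via its inverse CDF
data = np.column_stack([gauss, cn, ce])

weights = np.abs(spearman_matrix(data))
tree = maximum_spanning_tree_kruskal(weights)
print(tree)
```

Under these assumptions the tree should keep the three Gaussian variables together, link the two copula-coupled variables directly, and join the two groups through a single weak edge.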

5.2 Real datasets 

The success of simulation experiments on toy data can be easily
anticipated. We therefore also apply our method to two real datasets,
abalone and housing, from the UCI machine learning repository (Asuncion
& Newman, 2007) to study their inner dependence structure. Both datasets
are complete and have continuous and discrete attributes.

5.2.1 Abalone 

The Abalone dataset was built to predict the age of abalone from
physical measurements of the abalone body, such as weights and height.
It is composed of 4177 samples with 8 attributes. It can be viewed as a
regression problem in which some measurements are possibly intrinsically
interrelated. Instead of predicting the age, we focus on the dependence
relations among the attributes, which may benefit the prediction task.

There are a few outliers in the dataset. They are usually eliminated in
a pre-processing step, since otherwise they may cause large deviations
in the subsequent calculation of dependence measures. Here this is
unnecessary because the copula is less susceptible to outliers. This
makes the copula more advantageous than moment-based dependence
measures, which are sensitive to outliers. When estimating the empirical
copula in the experiment, we set the order of the lattice empirically,
trying different sizes and considering the trade-off between
approximation accuracy and computational cost. We choose Spearman's rho,
calculated by equation (10), and MI as dependence measures. During MI
estimation, the kernel method with well-tuned parameters was applied to
different moderate-sized subsets randomly sampled from the whole
dataset. The estimated values vary only a little. Due to space
limitations, we give no details about that. Then, with these two types
of dependence measures as weights, MSP trees were built.

Figure 1: Samples in the simulated data experiment. 'G1-3' represent
the Gaussian variables, and 'Cn', 'Ce' represent the two copula
variables with normal and exponential margins.

The original data are plotted in Figure 3. For illustrative purposes, we
present only a subset of four attributes. During empirical copula
estimation, the effect of outliers diminishes, as can easily be seen
from a comparison between Figure 3 and Figure 4.

Besides robustness to outliers, we emphasize another effect made
possible by the copula: dependence relations can be successfully
revealed by the estimated copula. It can be observed from Figure 3 that
all the attributes possess non-Gaussianity to some extent, which is
visible in their joint densities with other attributes. In contrast, all
the pairwise estimated copulas seem to show a very similar dependence
structure once the individual properties of the variables have been
separated from the joint distribution.

Figure 2: Dependence tree estimated in the simulated experiment.

Figure 5 shows one of the maximum spanning trees for abalone obtained in
our experiments, where the edges are labeled with their weights. Except
for sex and rings, the seven other attributes are linked by relatively
strongly weighted edges, which is consistent with Figure 4. It can also
be seen that the edges linking the seven physical measurements form the
backbone of all the estimated trees, while the nodes for "sex" and
"rings" are leaves attached to these seven nodes at random. This can be
interpreted as a reflection of the abalone's body growth: all the
physical indexes increase as the abalone grows, while rings and sex are
not strongly related to these seven physical attributes. We argue that
predicting rings from the other attributes in the abalone dataset may
not be a good experimental design.

5.2.2 Housing 

The Boston house price dataset comes from a 1970 census and was first
published by Harrison and Rubinfeld (Harrison & Rubinfeld, 1978), with
the aim of studying how to predict "Medv" ^1 based on the other 13
attributes. It contains 506 samples with 14 attributes of mixed type,
including 13 continuous attributes and 1 binary one. Previous research
mainly treats it as a regression problem in which the interrelations
between the attributes are ignored. In our experiment, we study the
whole dependence structure instead. Using the copula to estimate
dependence relations and to generate a maximum weight tree, we find some
previously unnoticed relations between the attributes.

^1 The abbreviations of the 14 attributes of the Housing dataset follow
the UCI machine learning dataset website (Asuncion & Newman, 2007).

Figure 3: Scatter plot of four attributes in the Abalone dataset. 'L',
'H', 'sw', 'vw' represent Length, Height, Shucked weight, and Viscera
weight.

Figure 4: The estimated empirical copula of four attributes in the
Abalone dataset. For the names of the variables, see Figure 3.

Some researchers propose transforming the data to a suitable scale
before further dependence analysis, through a monotonically increasing
function such as normalization or a nonlinear exponential/log function.
In our experiment this is unnecessary, because copulas are invariant to
this kind of transformation. As in the previous abalone experiment, the
MSP algorithm was run on moderate-sized datasets randomly sampled from
the housing dataset. Many dependence trees were generated, one of which
is plotted in Figure 7. The experimental results indicate that only two
edges, "crim-rad" and "medv-lstat", remain stable in all the estimated
trees. We also observed two groups of attributes ^2 that are weakly
interconnected to some extent.

^2 One includes "nox, dis, indus, crim"; the other includes "medv,
lstat, age, ptratio".

6 Discussions

Our philosophy of structure learning is that the more we know, the
better the structure we can learn. Learning with graphical models has
its limitations because it is based only on first-order dependence
relations. This kind of relation stems from empirical causality and is
suitable for understanding simple mechanisms, but will probably fail in
complicated situations. Using the copula, one can incorporate all the
dependence information without model constraints, and at the same time
this information consists of nothing but dependence relations. Structure
learning based on the copula thus provides a general framework that can
unify all the related structure learning methods. Its main advantage is
its independence from the particular properties of the individual
variables.

As we remarked above, different densities may have the same copula. In
this paper, the goal of structure learning based on the copula is to
maximize the total sum of certain dependence measures over the
structure, so that the structure spans as much dependence as possible.
The same task can be achieved by the maximum likelihood method, fitting
in the sense of mean-squared error. The generalization of the tree
structure to more complex structures, such as clusters or hypergraphs,
can be done in a similar way. The type of structure may be determined by
prior knowledge and the application background.

Dependence representation using only bivariate dependence is limited.
Given the $N(N - 1)$ pairwise dependence relations of $N$ random
variables, only $N - 1$ of them make up the tree approximation. To
examine the degree of approximation, we propose as a criterion the ratio
of the total dependence of the tree to the sum of all the $N(N - 1)$
relations. The ratios for all the experiments are plotted in Figure 8.
It can be seen that the tree approximation captures a large portion of
the total weight with relatively few edges, while a large amount of the
dependence relations is still not included in it. This indicates that,
though capable of grasping some major first-order dependence/causal
relations, the bivariate dependence structure itself has a limited
capability of dependence modeling on complex

Figure 5: Maximum spanning copula generated from the estimated empirical
copula of the Abalone dataset.

Figure 6: Maximum weight tree with correlation as weight, generated from
the Housing dataset.


7 Conclusions and Further Directions 

In this paper, we propose estimating dependence structure using the
copula method. The copula can represent all kinds of dependence
relations among random variables and makes no additional assumptions
about the underlying distributions. Graphical models are a special case
in the copula family, named the product copula. A rank-based algorithm
for copula estimation is presented, so that the copula function is
separated from the joint density and from the properties of the
individual variables. Such a nonparametric method provides great freedom
for structure learning, in that the estimated empirical copula contains
all the dependence information in the data. We then study learning a
product copula, or tree structure, based on the empirical copula: a
Chow-Liu-like method based on the empirical copula is proposed to
estimate the maximum spanning product copula with only bivariate
dependence relations. The proposed method was applied to simulated data
and two real datasets to approximate their dependence structure. The
experimental results show that the copula achieves a margin-free
dependence representation that is robust to outliers and invariant to
increasing transformations, and that the estimated product copulas help
us understand the underlying dependence structure.

Though widely applied to financial problems in the past few decades,
copula theory itself is probably still in its infancy and is gradually
gaining attention in many related fields. The main significance of the
copula for machine learning is that it provides a more general way of
representing, measuring, and inferring dependence than previously
proposed models. In this paper, we make only a small contribution
relative to the copula's potential. As far as structure learning is
concerned, many problems remain for copula methods. For example, how
should the copula model (or family, in the parametric case) be chosen
for different applications? How should a copula model be designed for
particular dependence relations? The product copula represents the
simplest dependence relations. In future work, we plan to study the
estimation of more complex dependence structures through copulas. This
is very important to many real applications in biology, finance, and
social science.